AD-A200  198 




ROLE  OF 

RETINOCORTICAL  PROCESSING 
IN  SPATIAL  VISION 


AFOSR-TR-88-0806


Annual  Report  1 


June  1988 


By:  D.H.  Kelly,  Staff  Scientist 

Sensory  Sciences  Research  Laboratory 


Prepared  for: 

United  States  Air  Force 

Air  Force  Office  of  Scientific  Research 

Directorate  of  Life  Sciences 

Building  410 

Bolling  Air  Force  Base 

Washington,  D.C.  20332-6448 

Attn:  Dr.  John  F.  Tangney 
Contract  F49620-87-K-0009 


SRI  Project  3558 




SRI  International 

333  Ravenswood  Avenue 

Menlo  Park,  California  94025-3493 

(415)  326-6200 

Telex:  334486 




UNCLASSIFIED 


SECURITY CLASSIFICATION OF THIS PAGE


REPORT DOCUMENTATION PAGE

Form Approved
OMB No. 0704-0188


1a. REPORT SECURITY CLASSIFICATION
Unclassified


1b. RESTRICTIVE MARKINGS


2a. SECURITY CLASSIFICATION AUTHORITY


2b. DECLASSIFICATION/DOWNGRADING SCHEDULE


3. DISTRIBUTION/AVAILABILITY OF REPORT
Unlimited


4. PERFORMING ORGANIZATION REPORT NUMBER(S)
Annual Report 1

SRI Project


6a. NAME OF PERFORMING ORGANIZATION
SRI International


6b. OFFICE SYMBOL
(If applicable)


6c. ADDRESS (City, State, and ZIP Code)
333 Ravenswood Avenue
Menlo Park, California 94025


8a. NAME OF FUNDING/SPONSORING ORGANIZATION
Air Force Office of Scientific Research


8c. ADDRESS (City, State, and ZIP Code)
Directorate of Life Sciences
Building 410
Bolling AFB, D.C. 20332-6448


11. TITLE (Include Security Classification)
Role of Retinocortical Processing in Spatial Vision


8b. OFFICE SYMBOL
(If applicable)
NL


5. MONITORING ORGANIZATION REPORT NUMBER(S)
AFOSR-TR. 88-08


7a. NAME OF MONITORING ORGANIZATION
AFOSR/NL


7b. ADDRESS (City, State, and ZIP Code)
Bolling AFB, DC 20332-6448


9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER
F49620-87-K-0009


10. SOURCE OF FUNDING NUMBERS
PROGRAM ELEMENT NO.: 61102F
PROJECT NO.: 2313
TASK NO.: A5

12.  PERSONAL  AUTHOR(S) 
Kelly,  Donald  H. 


13a.  TYPE  OF  REPORT 
Annual 


13b.  TIME  COVERED 
FROM  870501  TO  880501 


14. DATE OF REPORT (Year, Month, Day)
1988 June

15. PAGE COUNT
17


COSATI CODES
FIELD: 6
GROUP: 16
SUB-GROUP:


18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number)
Spatial  vision,  retinocortical  projection,  computational  model 


19. ABSTRACT (Continue on reverse if necessary and identify by block number)

Several important image-processing functions have been proposed for the geometric distortion known as cortical (or more precisely,
retinocortical) magnification. This spatial distortion can convert the radial velocities projected on the retina by egocentric motion
into uniform, rectilinear motion at the cortex. It can also convert changes of size and orientation in retinal coordinates into mere
translation at the cortex. (In both cases, an image-like property is converted into a map-like property.)

Thus cortical magnification must play an essential role in forming our stable percepts of the world around us, even as it vexes the question
of how the information from different fixations within the same scene can be arranged into a single percept. Is the cortical image
subsequently “undistorted,” just to facilitate the superposition of multiple fixations? An understanding of the image-coding functions of
the primary visual cortex (V1) should help to unravel this paradox.

Using the tools of computer vision (LISP algorithms developed on Symbolics networks), we are attempting to build a working model that
includes such processes as fixational eye movements, retinal filtering and inhomogeneity, retinocortical mapping, cortical (Gabor) image
coding, and other processes involved in the coordinate shifts needed for mapping purposes. Our goal is to understand as much as possible
about the roles of these early visual processes in forming a stable percept of the world around us.


20. DISTRIBUTION/AVAILABILITY OF ABSTRACT
☒ UNCLASSIFIED/UNLIMITED   □ SAME AS RPT.   □ DTIC USERS


21. ABSTRACT SECURITY CLASSIFICATION
Unclassified


22a. NAME OF RESPONSIBLE INDIVIDUAL
Dr. John Tangney


22b. TELEPHONE (Include Area Code)
(202) 767-5021

22c. OFFICE SYMBOL
NL


DD Form 1473, JUN 86    Previous editions are obsolete.    SECURITY CLASSIFICATION OF THIS PAGE


UNCLASSIFIED


CONTENTS


LIST OF ILLUSTRATIONS

I RESEARCH OBJECTIVES

II CURRENT STATUS OF WORK

A. High-Resolution Input Images

B. Generality of the Model

C. Continuous Approximations

1. Inhomogeneous Spatial Filtering

2. Temporal Effects

3. Multiple Fixations

D. Retinocortical Mapping Functions

E. Retinal Ganglion-Cell Spacing and Receptive-Field Sizes

F. Summary and Future Plans

III RESEARCH PERSONNEL

IV INTERACTIONS WITH SCIENTIFIC COMMUNITY

REFERENCES




ILLUSTRATIONS


1 Stages of the Visual Process Considered in the Present Study

2 Interior Scene: Original and an Example of Inhomogeneous Filtering

3 Effect of Bandpass (Laplacian) Filter on the Scene Shown in Figure 2

4 Mixture of Low-Pass (20%) and Bandpass (80%) Inhomogeneous Filtering

5 Combination of Figure 4 with Two Other Inhomogeneously Filtered Images,
Representing Three Different Fixations in the Same Scene

6 Example of a Distorted Cortical Representation,
Based on the Same Scene and Fixation Point as Figure 2(a)


I  RESEARCH  OBJECTIVES 


Our goal is to develop a computational model of the “front-end” stages of human spatial
vision, including the retina, retinocortical pathways, and primary visual cortex (V1), as illustrated
schematically in Figure 1. This computational product will be a functional, working model
that processes the entire stimulus pattern by appropriate algorithms and can depict its
representation at each stage in graphic imagery.

To make this task more manageable, important but noncritical simplifications must be
made. The model is confined to monocular, photopic, achromatic, quasi-stationary vision.
Motion is considered only to the extent that normal spatial processing requires minimal eye
movements. Binocularity is considered only by constraining V1 to leave room for interleaved
right- and left-eye connections.

Important parts of this complex system have been modeled in other studies. Our main
goal is to try to make them all fit together. In doing that, we expected to encounter problems
that have not shown up before (and this is beginning to occur). Where these parts, as currently
modeled, cannot be fitted together, we will try to learn why. To this end, we may modify
available models, reinterpreting the literature on which they are based. If need be, we will try
to devise crucial experiments to select among alternatives.




FIGURE  1  STAGES  IN  THE  VISUAL  PROCESS  CONSIDERED 
IN  THE  PRESENT  STUDY 

We  will  attempt  to  provide  representations  of  visual 
information  at  least  through  the  cortical  output. 




II CURRENT STATUS OF WORK


A.  High-Resolution  Input  Images 

To simulate the “real world,” we prepared several 100 x 125 mm (4 x 5 inch) fine-grain
photographs of both interior and exterior subjects, taken with a 90-mm lens. The original
negatives were scanned directly on a recently calibrated Optronics scanner. These images were
linearized with respect to luminance and stored as a digitized (pixel) array. Digital images
4000 x 4000 pixels in extent, with 8 bits of gray scale, are readily produced by this procedure.
The current small library of images can be expanded as needed.

B.  Generality  of  the  Model 

In keeping with the exploratory nature of this project, our software is designed to be as
modular and flexible as possible. In some of the work reported here, for example, the form of
our ganglion-cell receptive fields was based on the laplacian of a gaussian (Marr and Hildreth,
1980); however, we can readily substitute any other plausible kernel for retinal filtering. A
number of these functions have been proposed in the literature (Kelly, 1985, compares five of
them). Likewise, the diameter of the retinal receptive fields was presumed to be minimum at
the central fovea and to increase linearly with eccentricity; however, other eccentricity
functions can be substituted for this linear one.
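The two interchangeable ingredients named above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the project's LISP software; the function names `log_kernel` and `field_sigma`, the kernel radius of four standard deviations, and the default parameter values are all assumptions made for the example. The kernel is written with a positive center (the negative of the laplacian of a gaussian), as for an on-center receptive field.

```python
import numpy as np

def log_kernel(sigma, radius=None):
    """Sampled (negated) laplacian-of-gaussian kernel with a positive center.

    `sigma` is the standard deviation of the underlying gaussian, in pixels.
    The mean is subtracted so that the sampled kernel sums to zero, i.e. it
    has zero spatial dc response.
    """
    if radius is None:
        radius = int(np.ceil(4 * sigma))  # assumed truncation radius
    ax = np.arange(-radius, radius + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    kernel = (2 * sigma**2 - r2) / sigma**4 * np.exp(-r2 / (2 * sigma**2))
    return kernel - kernel.mean()

def field_sigma(eccentricity, sigma_fovea=1.0, slope=0.05):
    """Linear eccentricity function: minimum size at the fovea, growing at
    a constant rate. Both default values are hypothetical."""
    return sigma_fovea + slope * eccentricity
```

Any other kernel or eccentricity function with the same call signature could be substituted without touching the rest of such a pipeline, which is the kind of modularity the paragraph describes.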

We plan to use this capability to make small, quantitative changes in the functions and
parameters of various parts of the model when we come to fit the parts together. This in turn
should help to reveal where large, qualitative changes may be required, which is a main goal of this
project.

C.  Continuous  Approximations 

Ultimately the discrete sampling of the visual field, as performed by retinal, LGN, and
cortical cells and their interconnections (e.g., Sakitt and Barlow, 1982), constitutes a crucial
aspect of our model. However, a few important issues are relatively independent of
retinocortical sampling. These include

(1) The form of the spatial-filter kernels and scaling functions used to simulate retinal
inhomogeneity.

(2) The question of how to simulate the effects of temporal changes if we don’t model
them directly; this and Question (1) are closely related.




We  also  looked  at  an  additional  question: 

(3)  How  many  independent  fixations  does  it  take  to  extract  adequate  information  from  a 
natural  scene? 

These  questions  can  be  addressed  by  using  a  quasi-continuous  approach  in  which  we  filter 
an  input  image  at  every  pixel  (so  that  the  sampling  limit  is  governed  only  by  the  resolution  of 
the  digital  image).  Our  results  with  this  procedure  are  reported  in  the  following  three 
subsections. 


1. Inhomogeneous Spatial Filtering

A LISP program was written that convolves a high-resolution input image with a specified
kernel. The size of the kernel is minimum at a selected fixation point [such as the one
indicated in Figure 2(a)], and it expands linearly with distance from this point, at a specifiable rate.
The appearance of the inhomogeneously filtered image depends on the form chosen for the
kernel. Figure 2(b) shows the output image obtained with a low-pass (gaussian) filter. The
standard deviation of the gaussian kernel in this example varied by a factor of about 4 from the
fixation point to the edge of the image.
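A rough sketch of this kind of space-variant low-pass filtering is given below, in Python with NumPy rather than LISP. It is not the program described above: instead of convolving with a different kernel at every pixel, it approximates the result by blurring the image at a few fixed scales and interpolating between them according to distance from the fixation point. The function names, the number of blur levels, and the default values of `sigma0` and `rate` are assumptions for illustration.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable gaussian blur with edge padding (output has the input shape)."""
    radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1)
    g = np.exp(-ax**2 / (2.0 * sigma**2))
    g /= g.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, g, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, g, mode="valid")

def inhomogeneous_filter(img, fixation, sigma0=1.0, rate=0.02, n_levels=6):
    """Low-pass filter whose standard deviation grows linearly with distance
    (in pixels) from `fixation` = (row, col), which must lie inside the image."""
    h, w = img.shape
    rows, cols = np.indices((h, w))
    dist = np.hypot(rows - fixation[0], cols - fixation[1])
    sigma_map = sigma0 + rate * dist              # linear growth with distance

    # Blur the whole image at a few scales spanning the required range.
    sigmas = np.linspace(sigma_map.min(), sigma_map.max(), n_levels)
    stack = np.stack([gaussian_blur(img, s) for s in sigmas])

    # For each pixel, interpolate linearly between the two nearest scales.
    pos = (sigma_map - sigmas[0]) / (sigmas[-1] - sigmas[0]) * (n_levels - 1)
    lo = np.clip(np.floor(pos).astype(int), 0, n_levels - 2)
    frac = pos - lo
    return (1 - frac) * stack[lo, rows, cols] + frac * stack[lo + 1, rows, cols]
```

Choosing `rate` so that `sigma_map` at the image edge is about four times `sigma0` would mimic the factor-of-4 variation quoted for Figure 2(b).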

We have produced filtered outputs for several (interior and exterior) scenes with three
types of kernels: low-pass, bandpass, and low-pass/bandpass mixtures. Of these, the
low-pass/bandpass mixture seems to be the most realistic, for reasons discussed in the following
subsection.


2.  Temporal  Effects 

Our goal is to construct a model that concentrates on the spatial aspects of retinocortical
processing. However, temporal modulation of the retinal input, by eye movements and object
motion, has important effects on spatial vision. (Even when the eye is fixated on one point in
a stationary scene, involuntary drift motions and microsaccades modulate the retinal input.) In
order to make our results as realistic as possible, we must therefore modify the “purely
spatial” responses of the model to take account of the effects of temporal modulation. Apparently
these effects have not been incorporated in previous modeling attempts. The rationale for our
procedures is as follows.

Responses to moving or flickering sine-wave gratings show a well-known form of
reciprocity between spatial and temporal frequency responses (Kelly, 1979). The spatial frequency
response shifts from bandpass to low-pass with increasing temporal frequency, and the
temporal frequency response shifts from bandpass to low-pass with increasing spatial frequency.
Moreover, stabilized images disappear, which implies that the temporal dc response of vision is
zero.

FIGURE 2 INTERIOR SCENE: ORIGINAL AND EXAMPLE
OF INHOMOGENEOUS FILTERING

(a) INPUT, WITH ARROW SHOWING FIXATION POINT FOR FILTERED IMAGE

(b) EFFECT OF LOW-PASS, INHOMOGENEOUS FILTER


Is the spatial dc response also zero, as in a Marr-Hildreth filter [or any of the other
receptive-field filter functions compared in Kelly (1985)]? When this type of spatial filtering is
used, the resulting images contain only (bright and dark) edges on an otherwise blank (gray)
background (see Figure 3). However, such effects never appear subjectively, nor in the
physiological outputs of retinal ganglion cells.

At zero temporal frequency, we don’t see perfectly differentiated spatial information
because we don’t see anything (stabilized disappearance). However, at temporal frequencies
above zero, the spatial dc response is also greater than zero, increasing with increasing
temporal frequency as indicated above. (At and above the peak temporal frequency, the spatial
response becomes purely low-pass, just as the temporal response does above the peak spatial
frequency.)

These psychophysical results are consistent with physiological properties of retinal
ganglion cells (Frishman et al., 1987). If the ganglion-cell output is linear, then the zero dc
response is presumably the result of balanced antagonism between center and surround. But
the frequency response of the surround falls off before that of the center, so the latter
predominates with increasing frequency, either spatial or temporal. (If ganglion-cell outputs
are rectified, then we must consider the balance between on-center and off-center types, but the
principle is the same.)

These considerations support our basic assumption about the kind of spatial filtering that
controls normal vision. We assume that the filter kernel is a mixture of bandpass and low-pass
filtering, depending on the temporal frequencies present. For a stationary scene, these temporal
frequencies are entirely due to eye movements, but they cover as wide a range as the spatial
frequencies in the scene (because temporal frequency equals spatial frequency times the
eye-movement velocity, which is the same everywhere in the image).
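As a worked illustration of the relation in parentheses, a pattern of spatial frequency $f_s$ swept across the retina at velocity $v$ produces the temporal frequency $f_t$ given below. The numerical values are assumptions chosen only to make the arithmetic concrete; they are not measurements from this study.

```latex
f_t = f_s \, v ,
\qquad \text{e.g.} \quad
f_s = 1 \text{ to } 30\ \text{c/deg}, \quad v = 0.1\ \text{deg/s}
\;\Longrightarrow\;
f_t = 0.1 \text{ to } 3\ \text{Hz}.
```

Because $v$ is common to the whole image, the ratio of the highest to the lowest temporal frequency equals the ratio of the highest to the lowest spatial frequency (30:1 in this example).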

Note that this constitutes a kind of adaptive filter in the space domain. If the target is
simply a stationary (but unstabilized) sine-wave grating, then the bandwidth of the temporal
input is as low as natural eye movements permit, and the result is as close to Marr-Hildreth
filtering as an unstabilized retinal image can evoke. But the more complicated the scene, the
broader this temporal bandwidth becomes, with its higher frequencies evoking purely low-pass
spatial responses.

Nevertheless, a dilute form of spatial differentiation does occur at low temporal
frequencies, producing some degree of edge enhancement. We have simulated these temporal effects
by adding a low-pass spatial component to the bandpass process of Figure 3. Figure 4 shows
the results of such a combination process. In this example the combined filter is weighted 4:1
in favor of the bandpass process. Although pure spatial differentiation is unacceptable, human
subjects accept quite a lot of excess edge-contrast or “overshoot” in an image without noticing
anything unusual. (This effect was discovered more than 30 years ago during experiments
with the “synthetic highs” coding system. Perhaps some modern experiments to quantify it
are in order.)

We believe we have simulated the effects of temporal modulation on spatial processing
reasonably well by adding a low-pass section to the standard bandpass component, but that
does not diminish the importance of the spatial differentiation component. This can be seen by
comparing the results of Figure 4 with those of Figure 2(b) [or with the Kelly and Pentland
(1985) pilot study], which merely used low-pass filtering.
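The effect of such a mixture can be illustrated on a one-dimensional luminance step. The sketch below is in Python with NumPy, and it makes several assumptions: the filters are uniform (not inhomogeneous), the bandpass component is a scaled negative second difference of the blurred signal (a 1-D stand-in for the laplacian of a gaussian), and the relative gain of the two components, which the 4:1 weighting depends on, is set arbitrarily by the factor `sigma**2`. The function names are hypothetical.

```python
import numpy as np

def gaussian_1d(sigma):
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def lowpass(signal, sigma):
    """Gaussian blur with edge padding; unit dc gain."""
    g = gaussian_1d(sigma)
    padded = np.pad(signal, len(g) // 2, mode="edge")
    return np.convolve(padded, g, mode="valid")

def bandpass(signal, sigma):
    """Negative second difference of the blurred signal; zero dc gain."""
    s = np.pad(lowpass(signal, sigma), 1, mode="edge")
    return -(s[:-2] - 2 * s[1:-1] + s[2:]) * sigma**2

def mixed(signal, sigma, bandpass_weight=0.8):
    """Weighted sum of the two components (0.8 : 0.2 corresponds to 4:1)."""
    return ((1 - bandpass_weight) * lowpass(signal, sigma)
            + bandpass_weight * bandpass(signal, sigma))

# A luminance step from 1 to 2: the mixture keeps a (scaled) dc level on
# each side and adds overshoot and undershoot next to the edge.
step = np.concatenate([np.full(50, 1.0), np.full(50, 2.0)])
response = mixed(step, sigma=2.0)
```

With the bandpass component alone, the flat regions on either side of the step would both map to zero, which corresponds to the blank gray background described for Figure 3.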




3.  Multiple  Fixations 

Much information is lost in the periphery of the visual field when only a single fixation is
considered (as in Figures 3 and 4). To get some idea of how much of this loss may be
restored by combining the information from a few well-chosen fixations, we performed the
computational experiments described below. Although these experiments do not exactly
simulate anything the visual process does (as far as we know), they are useful in assessing the
information loss.

The visual system must have appropriate information about how eye and head movements
affect the retinal coordinates of external objects, in order to compensate for these motions in
such a way that the perceived environment stands still. (That information is known as the
“outflow” signal.) But there is no machinery distal to the cortex that could use this signal to
translate successive glimpses and superimpose them. That must occur at a location proximal to
V1, where the representation has already undergone significant geometrical distortion (as
discussed in Part D of this section).

Nevertheless, we can justify translating and superimposing in retinal coordinates the
information from different fixations on a test basis, if only to estimate the number of fixations that
might typically be needed by the post-striate machinery that does combine these successive
glimpses into a stable percept. (Before we try to determine how such a stable percept might
actually be constructed, we need to answer the questions raised in Parts D and E about more
distal parts of the system.)

Figure 5 shows an example of the information available from only three fixations in the
interior scene of Figure 2(a). One of these is on the man’s face [as in Figures 2(b), 3, and 4],
one is on the woman’s face, and one is on the open book. Retinal inhomogeneity was simulated
by inhomogeneously filtering each of the component images about its particular fixation point,
using the 4:1 bandpass/low-pass ratio illustrated in Figure 4. We made similar tests with other
scenes, including a more cluttered interior scene and the exterior of a building with many
straight-line segments.

In order to treat the results as upper bounds on the information obtainable from multiple
fixations, we gave considerable thought to devising an algorithm for combining these images
that would not, in itself, cause any significant information loss. Each fixation-image was first
filtered into a set of spatial-frequency bands by the so-called laplacian-pyramid technique,
which can seamlessly blend quite different images together (Burt and Adelson, 1983). The
filtered images within each band were then combined pixel-by-pixel, according to a formula
with the following properties:

•  If  all  pixel  intensities  were  equal,  it  returned  their  mean. 

•  If  only  one  intensity  was  non-zero,  it  returned  the  sum. 

•  For  intermediate  intensity  distributions,  the  output  varied  smoothly  between  these 
extremes. 

The resulting (pyramid) images were then combined to obtain the final result (such as Figure
5).
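The combining formula itself is not given here. One candidate that satisfies the three listed properties is a magnitude-weighted average, sketched below in Python with NumPy; this is an assumption offered for illustration, not necessarily the formula that was used. The function name `combine_band` and the guard value `eps` are hypothetical, and the laplacian-pyramid decomposition and reconstruction steps are not shown.

```python
import numpy as np

def combine_band(bands, eps=1e-12):
    """Combine corresponding band images from several fixations, pixel by pixel.

    Each pixel value is weighted by its own magnitude:
        out = sum(|x_i| * x_i) / sum(|x_i|)
    - all x_i equal       -> their common value (the mean)
    - only one x_i != 0   -> that value (the sum)
    - otherwise           -> varies smoothly between these extremes
    """
    x = np.stack(bands).astype(float)
    weights = np.abs(x)
    total = weights.sum(axis=0)
    # Where every input is zero the numerator is zero too, so the guard
    # simply returns zero there.
    return (weights * x).sum(axis=0) / np.maximum(total, eps)
```

Weighting by magnitude lets the fixation that carries the most detail at a given location dominate the result there, while agreeing inputs are simply averaged; signed band values are handled because the weights use absolute values.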




FIGURE  5  COMBINATION  OF  FIGURE  4  WITH  TWO  OTHER  INHOMOGENEOUSLY  FILTERED  IMAGES, 
REPRESENTING  THE  THREE  DIFFERENT  FIXATIONS  IN  THE  SAME  SCENE 

See  explanation  in  text. 


We conclude from these studies that a remarkably small number of well-chosen fixations
may contain enough information to produce a sharp impression of a large scene. Four fixation
points seemed adequate for the most complicated images we tried.


D.  Retinocortical  Mapping  Functions 

A retinocortical mapping function maps a given point in the retina to a corresponding
point in the striate cortex (V1); thus, it completely describes the geometrical distortion of the
cortical representation of the visual field. By way of example, Figure 6 shows an approximate
cortical representation of the scene used in Figures 2 through 4. Our model should include a
mapping function that matches its physiological counterpart as accurately as possible without
too much mathematical complexity.

The local change in cortical distance for a given increment of retinal distance is called the
retinocortical magnification; in the model, this magnification factor is given by the absolute
value of the differential of the mapping function. Except at the fovea, the retinocortical
magnification is known to vary approximately as $1/R$, where $R$ is eccentricity in the visual
field.


If it were exactly true that the magnification varied inversely with eccentricity, then the
mapping function could have a particularly simple form, in which polar coordinates at the
retina $(R, \theta)$ are mapped to Cartesian coordinates $(u, v)$ at the cortex by the function

$$u = \log R, \qquad v = \theta.$$


An elegant way of describing this relation, introduced into visual physiology by Schwartz
(1977), involves complex-variable theory. Let $z = x + iy = R e^{i\theta}$ at the retina, and $w = u + iv$
at the cortex. Then the idealized mapping function is simply $w = \log z = \log R + i\theta$, and
the magnification is $|dw/dz|$, which equals $1/R$. This is a conformal (angle-preserving)
transformation.
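The idealized mapping is easy to check numerically. The short Python/NumPy sketch below (function names are hypothetical, and units are arbitrary) maps retinal points through the complex logarithm and estimates the magnification by a finite difference. It also shows the property noted in the abstract: scaling a retinal point by a factor $k$ shifts its cortical image by $\log k$ along $u$, and rotating it by an angle $\phi$ shifts it by $\phi$ along $v$.

```python
import numpy as np

def retina_to_cortex(x, y):
    """Idealized log-polar map w = log z, with z = x + iy.

    Returns (u, v) = (log R, theta), theta on the principal branch (-pi, pi].
    Undefined at the fixation point z = 0.
    """
    w = np.log(x + 1j * y)
    return w.real, w.imag

def magnification(x, y, h=1e-6):
    """Finite-difference estimate of |dw/dz| at z = x + iy."""
    z = x + 1j * y
    return abs((np.log(z + h) - np.log(z)) / h)

# A point at eccentricity R = 2 and angle 0.3 rad ...
u1, v1 = retina_to_cortex(2 * np.cos(0.3), 2 * np.sin(0.3))
# ... the same point after a threefold change of size ...
u2, v2 = retina_to_cortex(6 * np.cos(0.3), 6 * np.sin(0.3))
# ... and after a rotation by 0.5 rad.
u3, v3 = retina_to_cortex(2 * np.cos(0.8), 2 * np.sin(0.8))
```

Here `u2 - u1` equals `log 3` with `v` unchanged, and `v3 - v1` equals 0.5 with `u` unchanged, so both retinal transformations become pure translations at the cortex.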

Unfortunately, this mapping function has a physically unrealizable singularity at $R = 0$.
To fit the cortical topography in primates and other species, Schwartz (1980) adds a real
constant, $c$, to the argument, making $w = \log(z + c)$. This gets rid of the singularity, but destroys
the circular symmetry of the differential and hence of the magnification factor as defined
above: the constant forces the retinocortical magnification to be greater in the vertical meridian
than in the horizontal one.

Estimates of human retinocortical magnification do not show perfect circular symmetry
either (particularly at large eccentricities), but the departure is in the other direction from that
predicted by $\log(z + c)$: horizontal magnification is greater than vertical (Rovamo and Virsu,
1979). Recent studies in the macaque also bear on this question (Tootell et al., 1982);
depending on how they are interpreted, the macaque data may either support or contradict the human
magnification estimates (Sakitt and Barlow, 1982; Letelier and Varela, 1984; Schwartz, 1985).

FIGURE 6 EXAMPLE OF A DISTORTED CORTICAL REPRESENTATION, BASED
ON THE SAME SCENE AND FIXATION POINT AS FIGURE 2(a)


Moreover, the horizontal-to-vertical magnification ratio increases monotonically with
eccentricity in the Rovamo and Virsu (1979) estimates, but in the Schwartz mapping function
this ratio inflects at eccentricity $c$, approaching unity again at greater eccentricities. Various
mapping functions that are otherwise consistent with the known physiology also have this
discrepancy; the problem is not yet resolved.
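The behavior of this ratio for the $\log(z + c)$ map can be verified directly. With magnification $|dw/dz| = 1/|z + c|$, the horizontal meridian ($z = R$) gives $1/(R + c)$ and the vertical meridian ($z = iR$) gives $1/\sqrt{R^2 + c^2}$, so the horizontal-to-vertical ratio is $\sqrt{R^2 + c^2}/(R + c)$. The Python/NumPy sketch below evaluates it; the value $c = 1$ (arbitrary units), the range of eccentricities, and the function names are assumptions for illustration, and only the hemifield with $\mathrm{Re}\, z \ge 0$ is considered.

```python
import numpy as np

def magnification(z, c):
    """|dw/dz| for the mapping w = log(z + c)."""
    return 1.0 / np.abs(z + c)

def hv_ratio(R, c):
    """Horizontal-to-vertical magnification ratio at eccentricity R."""
    horizontal = magnification(R + 0j, c)   # z = R on the horizontal meridian
    vertical = magnification(1j * R, c)     # z = iR on the vertical meridian
    return horizontal / vertical

c = 1.0                                     # hypothetical constant
R = np.linspace(0.0, 50.0, 5001)
ratio = hv_ratio(R, c)
R_at_minimum = R[np.argmin(ratio)]
```

The ratio starts at 1 at the fovea, never exceeds 1 (vertical magnification is the larger), reaches its minimum of $1/\sqrt{2}$ at $R = c$, and returns toward 1 at large eccentricities, in contrast to a ratio that grows monotonically above 1.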


E.  Retinal  Ganglion-Cell  Spacing  and  Receptive-Field  Sizes 

Once we adopt a magnification function, we can connect a particular retinal ganglion
cell to a particular cortical location. In order to take this anatomical approach, we must
consider the spacing between these discrete connections and the overlap (positive or negative) of
their receptive fields.

A look at two straightforward assumptions (neither original with us) may simplify the
model at this stage:

• We could assume uniform sampling in Cartesian coordinates $(u, v)$ at V1. Given a
specific mapping function, this assumption completely determines the sampling
distribution at the retina.

• We could postulate a polar-coordinate $(R, \theta)$ sampling distribution at the retina. If we
assume uniform sampling in angle $(\theta)$ and any radial sampling distribution, then the
mapping function will completely determine the sampling distribution at the cortex.

It would be appealing to try to combine the postulates of uniform sampling at the cortex and
circularly uniform sampling at the retina; however, this combination is incompatible with the
complex $\log(z + c)$ mapping function.

Log-polar mapping ($\log R$ or $\log z$) combined with uniform, Cartesian cortical sampling
dictates that the centers of retinal receptive fields be located at exponentially increasing
eccentricities along meridians. In this case it can be shown that, if these fields are circular and their
diameter is proportional to their spacing, they will overlap by a constant fraction of their area.
However, the receptive fields would then become infinitesimally small as $R$ approaches 0,
which is of course not realistic (the center of a foveal receptive field must always contain at
least one cone cell).
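A short numerical sketch of this argument along one meridian follows, in Python with NumPy. The cortical sampling step, the starting eccentricity, the diameter-to-spacing factor of 1.5, and the function names are all assumed values for illustration; neighboring fields are also treated as equal in size when computing their overlap, which is an approximation.

```python
import numpy as np

def retinal_centers(u_min, du, n):
    """Eccentricities R = exp(u) for n cortical samples spaced uniformly by du."""
    return np.exp(u_min + du * np.arange(n))

def overlap_fraction(spacing, diameter):
    """Fraction of a circular field's area shared with an equal-sized
    neighbor whose center lies `spacing` away (standard lens-area formula)."""
    r = diameter / 2.0
    d = np.minimum(spacing, 2 * r)          # no overlap once d reaches 2r
    lens = 2 * r**2 * np.arccos(d / (2 * r)) - 0.5 * d * np.sqrt(4 * r**2 - d**2)
    return lens / (np.pi * r**2)

R = retinal_centers(u_min=np.log(0.5), du=0.1, n=40)
spacing = np.diff(R)                        # grows in proportion to R
diameter = 1.5 * spacing                    # diameter proportional to spacing
fraction = overlap_fraction(spacing, diameter)
```

The ratio `spacing / R[:-1]` is the same for every sample (it equals `exp(du) - 1`), and `fraction` is constant along the meridian because it depends only on the ratio of spacing to diameter. The same arrays also show the problem noted above: both spacing and diameter shrink toward zero as the samples approach the fovea.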

If we eliminate the singularity at $R = 0$ [by writing $u = \log(R + c)$ or $w = \log(z + c)$], the
receptive-field locations are still approximately exponential outside the fovea, and their
diameters are minimum at $R = 0$. In this case, however, the proportion of overlap is no longer
constant. Receptive-field overlap becomes maximum at the fovea, decreasing steadily with
increasing eccentricity. Hence, there may be no overlap in the periphery.


F. Summary and Future Plans

Several important questions raised in the preceding sections need to be settled before we
can proceed further with implementing our model. Currently we are assessing alternative
postulates and the restrictions they imply for various properties of the visual system. It is already
apparent that we will be forced to modify traditional models of some parts of the system if we
are to fit them into a consistent picture of the whole.




Far from being a “hang-up,” this is a gratifying situation that confirms the value of doing
the present kind of study. We were aware of the difficulties of constructing a stable percept
out of geometrically distorted cortical representations, but we did not strongly expect to
uncover other significant inconsistencies, especially in the well-studied details of retinal
sampling and cortical projection. Fortunately, we have arrived at this point in the first year of the
project, and we plan to pursue these inconsistencies to the best of our ability.


IV INTERACTIONS WITH SCIENTIFIC COMMUNITY


The PI has been informally conferring on the subject of this study with visual scientists
and other peers for several years. Most of these talks occur at the ARVO and OSA meetings,
which he regularly attends. There are also informal exchanges between our vision group at
SRI and other local vision groups, such as those at Stanford, Berkeley, and Smith-Kettlewell.

Particularly relevant to the present report have been two occasions on which this project
was discussed with Dr. Eric Schwartz of NYU during the past year. Dr. Kelly and Dr.
Schwartz met for this purpose during the AFOSR Vision Meeting at Annapolis in December.
They talked about each other’s work and agreed to try to meet again. In April 1988, Dr.
Smith and Dr. Schwartz met at the SPIE meeting in Orlando and discussed various questions
about modeling the visual system. We plan to foster this common involvement during the next
two years of this project.

Also at the SPIE meeting, Dr. Richard Juday of Johnson Space Center presented his
results on an image sensor whose output is the log-polar transform of the sensed space (see
Section II-D). This appears to be a device that would, for the first time, allow rapid
experimentation with various “cortical” algorithms. In discussion with Dr. Smith, Dr. Juday
expressed interest in having other projects, such as ours, make use of the device. We will
pursue this connection to determine whether it could be of benefit to our study.

Because this is the first year of a new project (and a new direction for the PI), we have
not yet generated formal publications or oral presentations (other than at Annapolis) that could
be credited to this study. We expect that some of the topics treated above can be developed
for publication during the second year.


REFERENCES 


Burt, P.J., and E.H. Adelson, 1983: “A Multiresolution Spline with Application to Image
Mosaics,” ACM Trans. on Graphics, Vol. 2, No. 4, pp. 217-236 (October).

Frishman, L., A.W. Freeman, J.B. Troy, D.E. Schweitzer-Tong, and C. Enroth-Cugell, 1987:
“Spatiotemporal Frequency Responses of Cat Retinal Ganglion Cells,” J. Gen. Physiol.,
Vol. 89, pp. 599-628.

Kelly, D.H., 1979: “Motion and Vision. II. Stabilized Spatio-Temporal Threshold Surface,” J.
Opt. Soc. Am., Vol. 69, pp. 1340-1349.

Kelly, D.H., 1985: “Retinal Inhomogeneity. III. Circular-Retina Theory,” J. Opt. Soc. Am.,
Vol. A2, pp. 810-819.

Kelly, D.H., and A.P. Pentland, 1985: “Why We See the Whole World Sharply: Eye
Movements, Retinal Inhomogeneity and Cortical Processing,” Unpublished Report, SRI
International.

Letelier, J.C., and F. Varela, 1984: “Why the Cortical Magnification Factor in Rhesus is
Isotropic,” Vision Res., Vol. 24, pp. 1091-1095.

Marr, D., and E. Hildreth, 1980: “Theory of Edge Detection,” Proc. Royal Soc. London Ser.,
Vol. B, No. 290, pp. 199-218.

Rovamo, J., and V. Virsu, 1979: “An Estimation and Application of the Human Cortical
Magnification Factor,” Exp. Brain Res., Vol. 37, pp. 495-510.

Sakitt, B., and H. Barlow, 1982: “A Model for the Economical Encoding of the Visual Image
in the Cerebral Cortex,” Biol. Cybern., Vol. 43, pp. 97-108.

Schwartz, E., 1977: “Spatial Mapping in the Primate Sensory Projection: Analytic Structure
and Relevance to Perception,” Biol. Cybern., Vol. 25, pp. 181-194.

Schwartz, E., 1980: “Computational Anatomy and Functional Architecture of Striate Cortex:
A Spatial Mapping Approach to Perceptual Coding,” Vision Res., Vol. 8, pp. 645-669.

Schwartz, E., 1985: “On the Mathematical Structure of the Visuotopic Mapping of Macaque
Striate Cortex,” Science, Vol. 227, pp. 1065-1066.

Tootell, R.B.H., M.S. Silverman, E. Switkes, and R.L. De Valois, 1982: “Deoxyglucose
Analysis of Retinotopic Organization in Primate Striate Cortex,” Science, Vol. 218, pp.
902-904.




